Global-knot spline LSQ with matrix-free CGLS by krystophny · Pull Request #150 · itpplasma/libneo

krystophny · 2025-12-02T00:10:02Z

User description

This PR implements a global-knot, matrix-free CGLS-based least-squares spline fitting pipeline for 1D/2D/3D and batch 1D splines.

Highlights:

Introduce and solvers using ISO Fortran 2018 (, ), with OpenMP/SIMD matvecs and explicit dimension checks.
Refactor spline LSQ constructors so the unknowns are global knot values, reusing existing to generate cell coefficients and preserve continuity.
Add LSQ constructors for 1D/2D/3D and a new global-knot LSQ path for batch 1D ().
Add new tests:
- and for the solvers.
- for global-knot CGLS spline reconstruction.
- Extended LSQ coverage in (1D/2D/3D masks) and (batch 1D scattered LSQ).
All tests pass via cmake --build --preset default
[1/9] cd /home/ert/code/libneo/extra/MyMPILib && /home/ert/code/libneo/extra/MyMPILib/Scripts/do_versioning.sh
Versioning MyMPILib...
[2/9] Building Fortran preprocessed extra/MyMPILib/CMakeFiles/MyMPILib.dir/Specific/mpiprovider_module.f90-pp.f90
[3/9] Generating Fortran dyndep file extra/MyMPILib/CMakeFiles/MyMPILib.dir/Fortran.dd
[4/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Specific/mpiprovider_module.f90.o
[5/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Generic/genericWorkunit_module.f90.o
[6/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Generic/initWorkunit_module.f90.o
[7/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Internal/wuMergeWorkunit_module.f90.o
[8/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Internal/wuMergeChunk_module.f90.o
[9/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Internal/wuDataRequester_module.f90.o
[10/15] Building Fortran object extra/MyMPILib/CMakeFiles/MyMPILib.dir/Generic/scheduler_module.f90.o
[11/15] Linking Fortran static library extra/MyMPILib/libMyMPILib.a
[12/15] Building Fortran object test/CMakeFiles/test_mympilib.x.dir/source/derived_scheduler_module.f90.o
[13/15] Linking Fortran executable test/test_arnoldi.x
[14/15] Building Fortran object test/CMakeFiles/test_mympilib.x.dir/source/test_mympilib.f90.o
[15/15] Linking Fortran executable test/test_mympilib.x
cd build && ctest
Test project /home/ert/code/libneo/build
Start 1: test_binsrc
1/44 Test Make MPI optional and minimize dependencies on Fortran version and external libs #1: test_binsrc ....................... Passed 0.01 sec
Start 2: test_boozer_class
2/44 Test Integrate, document and test VMEC interfaces #2: test_boozer_class ................. Passed 0.01 sec
Start 3: test_efit_class
3/44 Test Supply latest version of all magnetic field routine variants #3: test_efit_class ................... Passed 0.01 sec
Start 4: test_geqdsk_tools
4/44 Test Ninjarun: handle Fortran module dependencies #4: test_geqdsk_tools ................. Passed 0.02 sec
Start 5: test_simpson
5/44 Test bdivfree routine: double precision should be changed back to double complex #5: test_simpson ...................... Passed 0.01 sec
Start 6: test_hdf5_tools
6/44 Test Remove aug from repository #6: test_hdf5_tools ................... Passed 2.01 sec
Start 7: test_util
7/44 Test Vmec #7: test_util ......................... Passed 0.01 sec
Start 8: test_cgls_dense
8/44 Test Python VMEC magfie #8: test_cgls_dense ................... Passed 0.01 sec
Start 9: test_cgls_small
9/44 Test Adapting for Marconi Intel configuration #9: test_cgls_small ................... Passed 0.01 sec
Start 10: test_interpolate
10/44 Test Merge changes from magfie #10: test_interpolate .................. Passed 9.06 sec
Start 11: test_batch_interpolate
11/44 Test Move issues from magfie #11: test_batch_interpolate ............ Passed 1.16 sec
Start 12: test_spline_cgls_global
12/44 Test Update build and CI/CD based on magfie #12: test_spline_cgls_global ........... Passed 0.01 sec
Start 13: test_collision_freqs
13/44 Test First eqdsk tests #13: test_collision_freqs .............. Passed 0.01 sec
Start 14: test_transport
14/44 Test extract vacfield (and coil_tools) from MEPHIT #14: test_transport .................... Passed 0.01 sec
Start 15: test_vmec_modules
15/44 Test Consistency check mixes geometric and logical coordinates #15: test_vmec_modules ................. Passed 0.01 sec
Start 16: test_coordinate_systems
16/44 Test Add check for existence of convexwall #16: test_coordinate_systems ........... Passed 0.01 sec
Start 17: test_analytical_circular
17/44 Test Evaluate equilibrium field only if specified #17: test_analytical_circular .......... Passed 0.02 sec
Start 18: test_ascot5_compare
18/44 Test Implement Biot-Savart variant #18: test_ascot5_compare ............... Passed 9.34 sec
Start 19: test_arnoldi_setup
19/44 Test Replace pfile format by mgrid format #19: test_arnoldi_setup ................ Passed 0.22 sec
Start 20: test_arnoldi
20/44 Test Check maybe undefined behavior #20: test_arnoldi ...................... Passed 0.05 sec
Start 21: test_mympilib
21/44 Test Created converter for the conversion between poloidal and toroidal flux lable s_pol and s_tor #21: test_mympilib ..................... Passed 0.05 sec
Start 22: test_system_utility
22/44 Test SPLIME #22: test_system_utility ............... Passed 0.01 sec
Start 23: setup_test_jorek_field
23/44 Test Revised EQDSK class #23: setup_test_jorek_field ............ Passed 0.01 sec
Start 32: test_jorek_field
24/44 Test Add Boozer Fourier transformations #32: test_jorek_field .................. Passed 0.03 sec
Start 33: test_field
25/44 Test Add checks between Fortran and Python EQDSK routines #33: test_field ........................ Passed 0.03 sec
Start 24: cleanup_test_jorek_field
26/44 Test Support CHEASE EQDSK files #24: cleanup_test_jorek_field .......... Passed 0.01 sec
Start 25: test_biotsavart
27/44 Test Support FREEGS EQDSK files #25: test_biotsavart ................... Passed 0.21 sec
Start 26: test_example_field
28/44 Test Updated comments/naming #26: test_example_field ................ Passed 0.01 sec
Start 27: test_biotsavart_field
29/44 Test branch psipol2phitor: Created converter between poloidal and toroidal #27: test_biotsavart_field ............. Passed 0.01 sec
Start 28: test_mesh
30/44 Test Add check for q profile for EQDSK #28: test_mesh ......................... Passed 0.01 sec
Start 29: test_field_mesh
31/44 Test Add routines to convert to harmonics in flux coordinates #29: test_field_mesh ................... Passed 0.01 sec
Start 30: test_polylag_field
32/44 Test Test even order in spline three to five #30: test_polylag_field ................ Passed 0.06 sec
Start 31: test_spline_field
33/44 Test Integration test API #31: test_spline_field ................. Passed 0.27 sec
Start 34: test_stretch_coords
34/44 Test Add splines for covariant B components #34: test_stretch_coords ............... Passed 0.07 sec
Start 35: test_coil_tools_biot_savart
35/44 Test Design MARS interface via OMFIT #35: test_coil_tools_biot_savart ....... Passed 9.04 sec
Start 36: tilted_coil_generate_geometry
36/44 Test Extend EQDSK interface to output B0 #36: tilted_coil_generate_geometry ..... Passed 0.23 sec
Start 37: tilted_coil_fourier_modes
37/44 Test Move all tests that depend on data in /proj/plasma to CODE #37: tilted_coil_fourier_modes ......... Passed 40.29 sec
Start 38: tilted_coil_field_validation
38/44 Test Small workflow changes for development environment #38: tilted_coil_field_validation ...... Passed 4.20 sec
Start 39: test_polylag_5
39/44 Test Activate highest optimization for coil_tools #39: test_polylag_5 .................... Passed 0.01 sec
Start 40: test_poincare
40/44 Test Splime2 #40: test_poincare ..................... Passed 0.02 sec
Start 41: test_odeint_allroutines_context
41/44 Test Fix index errors in 3D splines #41: test_odeint_allroutines_context ... Passed 0.01 sec
Start 42: test_odeint_thread_safety
42/44 Test Add derivatives to splines #42: test_odeint_thread_safety ......... Passed 0.01 sec
Start 43: build_test_golden_record_odeint
43/44 Test 3D splines of second order derivatives #43: build_test_golden_record_odeint ... Passed 0.02 sec
Start 44: test_golden_record_odeint
44/44 Test Improve OpenMP in Biot-Savart code #44: test_golden_record_odeint ......... Passed 0.18 sec

100% tests passed, 0 tests failed out of 44

Label Time Summary:
biot_savart = 9.04 secproc (1 test)
build-helper = 0.02 secproc (1 test)
field = 0.70 secproc (10 tests)
magfie = 53.77 secproc (4 tests)
poincare = 0.02 secproc (1 test)
polylag = 0.01 secproc (1 test)
tilted_coil = 44.72 sec*proc (3 tests)

Total Test time (real) = 76.80 sec (ctest), including new LSQ and CGLS suites.

PR Type

Enhancement

Description

Implement matrix-free CGLS solvers for small and dense systems
- cgls_small.f90: Fixed-size solver for normal equations (ISO Fortran 2018)
- cgls_dense.f90: General dense solver with OpenMP/SIMD optimization and optional weights
Add global-knot LSQ spline constructors for 1D/2D/3D and batch 1D
- Design matrix builders for each dimension using basis function evaluation
- LSQ constructors that solve for knot values via CGLS, preserving spline continuity
Extend test coverage with new CGLS solver tests and LSQ spline validation
- Tests for scattered data interpolation in 1D/2D/3D
- Tests for masked domains (circular, spherical, toroidal)
- Global-knot design matrix exactness verification

Diagram Walkthrough

flowchart LR
  A["Scattered Data<br/>x_data, f_data"] --> B["Design Matrix<br/>build_design_matrix_*d"]
  B --> C["CGLS Solver<br/>cgls_dense_solve"]
  C --> D["Global Knots<br/>y_knots"]
  D --> E["Spline Coefficients<br/>construct_splines_*d"]
  E --> F["Fitted Spline<br/>SplineData*D"]

File Walkthrough

Relevant files

Enhancement

4 files

cgls_small.f90 `New small fixed-size CGLS solver module`	+99/-0
cgls_dense.f90 `New dense matrix CGLS solver with OpenMP/SIMD`	+212/-0
interpolate.f90 `Add LSQ constructors and design matrix builders for 1D/2D/3D`	+405/-2
batch_interpolate_1d.f90 `Add batch LSQ constructor and basis unit allocator`	+117/-1

Tests

5 files

test_cgls_small.f90 `New tests for small CGLS solver on SPD matrices`	+96/-0
test_cgls_dense.f90 `New tests for dense CGLS solver weighted/unweighted`	+97/-0
test_interpolate.f90 `Add LSQ spline tests for 1D/2D/3D scattered and masked domains`	+456/-2
test_batch_interpolate.f90 `Add batch 1D LSQ scattered data interpolation test`	+91/-1
test_spline_cgls_global.f90 `New test for global-knot design matrix CGLS exactness`	+109/-0

Configuration changes

2 files

CMakeLists.txt `Register new CGLS solver modules in build`	+2/-0
CMakeLists.txt `Register new CGLS and global-knot test executables`	+12/-0

qodo-code-review · 2025-12-02T00:10:40Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance

🟢

No security concerns identified

No security vulnerabilities detected by AI analysis. Human verification advised for critical code.

Ticket Compliance

🟡

🎫 #1

🟢	Minimize reliance on Fortran features beyond F95 where possible.
🔴	Make MPI entirely optional in low-level routines like arnoldi via preprocessor or remove MPI usage from such routines.
⚪	Reduce dependencies on external libraries where feasible.

🟡

🎫 #4

🔴	Implement a scanner to handle Fortran module dependencies for Ninja dyndep integration.
	Ensure Ninja build uses dyndep for accurate Fortran module build ordering.

🟡

🎫 #3

🔴	Provide latest versions of magnetic field routine variants across multiple coordinate systems and sources.

🟡

🎫 #2

🔴	Integrate, document, and test VMEC interfaces, potentially requiring NetCDF/HDF5.
	Unify VMEC field routines across codes, incorporating specific fixes and features.
	Add automated unit tests for VMEC-related routines and derivatives comparisons.
	Add documentation (API/user docs) for VMEC routines.
⚪	Various TODOs including stopping empty NetCDF creation, tests for derivatives and fields vs libstell, pfUnit automation, coverage, GitHub actions, adding INTENTs, and Doxygen docs.

Codebase Duplication Compliance

⚪

Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance

🟢

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

⚪

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
No audit logs: The new numerical routines perform computations and error stops without any audit logging,
but given this is a numerical library layer with no sensitive actions, it likely does not
require audit trails; confirm at a higher layer if critical actions need logging.

Referred Code

subroutine cgls_dense_core(phi, rhs, x, max_iter, tol)
    !! Conjugate Gradient for normal equations using
    !! explicit matrix-vector products (unweighted case).
    real(dp), intent(in) :: phi(:,:)
    real(dp), intent(in) :: rhs(:)
    real(dp), intent(inout) :: x(:)
    integer, intent(in) :: max_iter
    real(dp), intent(in) :: tol

    integer :: iter, n_rows, n_cols
    real(dp) :: gamma, gamma_new, alpha, beta, denom
    real(dp) :: rhs_norm

    real(dp), allocatable :: r(:), s(:), p(:), q(:)

    n_rows = size(phi, 1)
    n_cols = size(phi, 2)

    if (size(rhs) /= n_rows) then
        error stop "cgls_dense_core: rhs size mismatch"
    end if


 ... (clipped 109 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Error stops only: New constructors use error stop on input mismatches and empty datasets without contextual
recovery or status returns, which may be acceptable for a library/test context but should
be reviewed if used in production flows expecting graceful handling.

Referred Code

subroutine construct_splines_1d_lsq(x_min, x_max, order, periodic, &
    num_points, x_data, f_data, spl, weights)
    real(dp), intent(in) :: x_min, x_max
    integer, intent(in) :: order
    logical, intent(in) :: periodic
    integer, intent(in) :: num_points
    real(dp), intent(in) :: x_data(:)
    real(dp), intent(in) :: f_data(:)
    real(dp), intent(in), optional :: weights(:)
    type(SplineData1D), intent(out) :: spl

    integer :: n_data, n_keep, i
    real(dp), allocatable :: x_used(:), f_used(:), w_used(:)
    real(dp), allocatable :: phi(:,:), y(:)
    real(dp) :: h_ref, tol_x
    logical :: use_direct

    n_data = size(x_data)
    if (n_data /= size(f_data)) then
        error stop "construct_splines_1d_lsq: x_data and f_data size mismatch"
    end if


 ... (clipped 74 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Update

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

qodo-code-review · 2025-12-02T00:11:53Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
High-level	Avoid large temporary memory allocations To reduce memory usage in the weighted least-squares solver, modify the matrix-vector product routines in `cgls_dense.f90` to apply weights dynamically instead of creating large temporary copies of the design matrix and right-hand side. Examples: src/interpolate/cgls_dense.f90 [190-209] if (present(w)) then allocate(phi_eff(n_rows, n_cols)) allocate(rhs_eff(n_rows)) !$omp parallel do default(shared) private(i, j) do i = 1, n_rows rhs_eff(i) = w(i)rhs(i) !$omp simd do j = 1, n_cols phi_eff(i, j) = w(i)phi(i, j) ... (clipped 10 lines) Solution Walkthrough: Before: subroutine cgls_dense_solve(phi, rhs, x, w, ...) ... if (present(w)) then ! Allocate full copies of phi and rhs allocate(phi_eff(n_rows, n_cols)) allocate(rhs_eff(n_rows)) ! Populate weighted copies do i = 1, n_rows rhs_eff(i) = w(i)rhs(i) do j = 1, n_cols phi_eff(i, j) = w(i)phi(i, j) end do end do call cgls_dense_core(phi_eff, rhs_eff, x, ...) deallocate(phi_eff, rhs_eff) else call cgls_dense_core(phi, rhs, x, ...) end if end subroutine After: subroutine cgls_dense_solve(phi, rhs, x, w, ...) ... ! No large temporary allocations for weights call cgls_dense_core(phi, rhs, x, w, ...) end subroutine subroutine cgls_dense_core(phi, rhs, x, w, ...) ... ! Initial residual calculation if (present(w)) then r = w * rhs else r = rhs end if ! Matrix-vector products apply weights on-the-fly call matvec_phi_t_core(phi, r, s, w) ! Computes s = phi^T * (wr) ... loop call matvec_phi_core(phi, p, q, w) ! Computes q = (wphi) * p ... call matvec_phi_t_core(phi, r, s, w) ! Computes s = phi^T * (w*r) ... end loop end subroutine Suggestion importance[1-10]: 8 __ Why: The suggestion correctly identifies a significant memory allocation issue in `cgls_dense_solve` that could impact performance with large datasets, and the proposed on-the-fly weighting is a valid and impactful optimization.	Medium
Possible issue	Correct the solver's stopping condition Change the relative stopping condition in `cgls_solve_small` to an absolute tolerance check to prevent premature convergence. src/interpolate/cgls_small.f90 [85] -if (r_norm2_new <= tolr_norm2) exit +if (r_norm2_new <= tol) exit Apply / Chat* Suggestion importance[1-10]: 8 __ Why: The suggestion correctly identifies a potentially problematic relative stopping criterion in an iterative solver and proposes a more robust absolute tolerance check, which prevents premature termination and improves correctness.	Medium
General	Refactor duplicated data filtering logic Refactor the two-loop data filtering logic in the new `..._lsq` subroutines into a more efficient single loop by pre-allocating arrays and resizing with `move_alloc`. src/interpolate/interpolate.f90 [242-272] subroutine construct_splines_1d_lsq(x_min, x_max, order, periodic, & num_points, x_data, f_data, spl, weights) ... - n_keep = 0 - do i = 1, n_data - if (.not. periodic) then - if (x_data(i) < x_min .or. x_data(i) > x_max) cycle - end if - if (present(weights)) then - if (weights(i) == 0.0_dp) cycle - end if - n_keep = n_keep + 1 - end do - - if (n_keep == 0) then - error stop "construct_splines_1d_lsq: no usable data" - end if - - allocate(x_used(n_keep), f_used(n_keep)) - if (present(weights)) allocate(w_used(n_keep)) + allocate(x_used(n_data), f_used(n_data)) + if (present(weights)) allocate(w_used(n_data)) n_keep = 0 do i = 1, n_data if (.not. periodic) then if (x_data(i) < x_min .or. x_data(i) > x_max) cycle end if if (present(weights)) then if (weights(i) == 0.0_dp) cycle w_used(n_keep + 1) = weights(i) end if n_keep = n_keep + 1 x_used(n_keep) = x_data(i) f_used(n_keep) = f_data(i) end do + + if (n_keep == 0) then + error stop "construct_splines_1d_lsq: no usable data" + end if + + if (n_keep < n_data) then + call move_alloc(from=x_used(1:n_keep), to=x_used) + call move_alloc(from=f_used(1:n_keep), to=f_used) + if (present(weights)) then + call move_alloc(from=w_used(1:n_keep), to=w_used) + end if + end if ... end subroutine construct_splines_1d_lsq `[To ensure code accuracy, apply this suggestion manually]` Suggestion importance[1-10]: 6 __ Why: The suggestion correctly identifies a repeated and inefficient two-pass filtering pattern across multiple new subroutines and proposes a valid, more performant single-pass approach.	Low
Update

qodo-code-review bot added the Review effort 4/5 label Dec 2, 2025

krystophny temporarily deployed to github-pages December 2, 2025 00:11 — with GitHub Actions Inactive

krystophny temporarily deployed to github-pages December 2, 2025 00:14 — with GitHub Actions Inactive

krystophny added 2 commits January 11, 2026 02:50

interpolate: add global-knot LSQ with CGLS and batch support

61a9eae

interpolate: re-export batch 1D LSQ constructor

29c0fc9

krystophny force-pushed the feature/lsq_scattered_interpolate branch from 570f11a to 29c0fc9 Compare January 11, 2026 01:52

krystophny temporarily deployed to github-pages January 11, 2026 02:00 — with GitHub Actions Inactive

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Global-knot spline LSQ with matrix-free CGLS#150

Global-knot spline LSQ with matrix-free CGLS#150
krystophny wants to merge 2 commits intomainfrom
feature/lsq_scattered_interpolate

krystophny commented Dec 2, 2025 •

edited by qodo-code-review bot

Loading

Uh oh!

qodo-code-review bot commented Dec 2, 2025 •

edited

Loading

Uh oh!

qodo-code-review bot commented Dec 2, 2025 •

edited

Loading

Examples:

Solution Walkthrough:

Before:

After:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

krystophny commented Dec 2, 2025 • edited by qodo-code-review bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

User description

PR Type

Description

Diagram Walkthrough

File Walkthrough

Uh oh!

qodo-code-review bot commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Compliance Guide 🔍

Uh oh!

qodo-code-review bot commented Dec 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Examples:

Solution Walkthrough:

Before:

After:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

krystophny commented Dec 2, 2025 •

edited by qodo-code-review bot

Loading

qodo-code-review bot commented Dec 2, 2025 •

edited

Loading

qodo-code-review bot commented Dec 2, 2025 •

edited

Loading